    Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability (Artifact)

    MLGOPerf: An ML Guided Inliner to Optimize Performance

    For the past 25 years, we have witnessed extensive application of Machine Learning to the compiler space, namely the selection and the phase-ordering problems. However, few works have been upstreamed into state-of-the-art compilers, i.e., LLVM, to integrate ML seamlessly into the optimization pipeline so that users can readily deploy it. MLGO was among the first such projects, and it strives only to reduce the code size of a binary with an ML-based inliner using Reinforcement Learning. This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model to generate rewards used for training a retargeted Reinforcement Learning agent, previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis, which enables a fast training framework for the primary model that otherwise wouldn't be practical. The experimental results show MLGOPerf is able to gain up to 1.8% and 2.2% with respect to LLVM's optimization at O3 when trained for performance on SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% increased opportunities to autotune code regions for our benchmarks, which can be translated into an additional 3.7% speedup value. Comment: Version 2: Added the missing Table 6. The short version of this work is accepted at ACM/IEEE CASES 202

    Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

    The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology for designing instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to guide the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research.

    Accelerating Object-Sensitive Pointer Analysis by Exploiting Object Containment and Reachability

    Speculative Parallelism Improves Search?

    The extreme efficiency of sequential search, and the natural tendency of tree pruning systems to produce wide variations in workload, partly explain why it is proving difficult to achieve more than 30-50% efficiency for massively parallel implementations of the αβ algorithm. Here we introduce typical enhanced sequential algorithms and address the major issues of parallel game-tree searching under conditions of severe pruning. It is this pruning that makes the parallelization difficult. After examining previous work on parallel αβ algorithms, we present a new method called Dynamic Multiple Principal Variation Splitting (DM-PVSplit) and implement it on the AP1000. In this algorithm, high performance is achieved by using some novel approaches. Parallel speculative search of candidate principal variations is used to reduce re-search delay and so obtain more quickly a better estimate of the subtree value. This is achieved by configuring a flat processor arrangement as a dynamically changeable tree structure. Also, with the aid of a group-based scheduling strategy, the game tree is split dynamically at different levels. This provides better load balance and takes more advantage of parallelism. Preliminary experiments show that the scalability of the DM-PVSplit algorithm is good for massively parallel machines.
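
    To illustrate the severe pruning the abstract refers to, here is a minimal sequential alpha-beta sketch. It is not DM-PVSplit and not the authors' code; the nested-list tree encoding, the `alphabeta` function, and the `stats` counter are assumptions made for this example only.

```python
from math import inf


def alphabeta(node, alpha=-inf, beta=inf, maximizing=True, stats=None):
    """Return the minimax value of `node` (a leaf number or a list of children).

    Hypothetical helper for illustration: `stats["visited"]` counts every
    node the search touches, so the effect of pruning can be observed.
    """
    if stats is not None:
        stats["visited"] = stats.get("visited", 0) + 1
    if not isinstance(node, list):
        return node  # leaf: static evaluation
    if maximizing:
        best = -inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cutoff: remaining siblings cannot change the result
        return best
    best = inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, stats))
        beta = min(beta, best)
        if alpha >= beta:
            break  # cutoff
    return best


# A toy two-ply tree with 10 nodes in total (1 root, 3 inner nodes, 6 leaves).
tree = [[3, 5], [2, 9], [0, 1]]
stats = {}
value = alphabeta(tree, stats=stats)
print(value, stats["visited"])  # prints: 3 8
```

    Which subtrees get cut depends on values found earlier in the search, so the work under each branch is hard to predict in advance. That is the workload variation the abstract names as an obstacle to efficient parallel search.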

    Multithreaded Pruned Tree Search In Distributed Systems

    Although efficient support for data-parallel applications is relatively well established, it remains an open question how best to support irregular and dynamic problems that have no regular data structures or communication patterns. Tree search is central to solving a variety of problems in artificial intelligence and is an important subset of the irregular applications in which tasks are frequently created and terminated. In this paper, we introduce the design of a multithreaded distributed runtime system. Efficiency and ease of parallel programming are the two primary goals. In our system, multithreading is used to specify the asynchronous behavior in parallel game tree search, and dynamic load balancing is employed for efficient performance.

    IoP System Dependability Evaluation Method Based on AADL

    The Internet of People (IoP) is characterized by complex architecture and massive, changing data, which makes the dependability of IoP-based systems difficult to analyze. Currently, there is still no robust dependability modelling and analysis method for IoP systems. This paper proposes an Architecture Analysis and Design Language (AADL)-based dependability evaluation method for IoP systems. By using AADL and its annex language, the dependability of IoP systems is modeled to support the qualitative analysis of the causes of system failures and risks. Furthermore, by combining the Ocarina model transformation technology, a quantitative evaluation algorithm based on the Continuous-Time Markov Chain (CTMC) is proposed. The algorithm transforms the AADL dependability model to the CTMC model, so that the dynamic and real-time attributes of IoP systems can be evaluated quantitatively. On this basis, a general IoP system model is designed to demonstrate the feasibility of the proposed method. The experimental results show that the proposed method can be used to model IoP systems and perform dependability analysis automatically and accurately, displaying a high application value.
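
    As a rough illustration of what a quantitative CTMC evaluation can look like once such a model exists, the sketch below computes the steady-state distribution of a small generator matrix. It does not reproduce the paper's AADL-to-CTMC transformation or its algorithm; the two-state model, the rate values, and the `steady_state` helper are all assumptions made for this example.

```python
def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 for a small CTMC generator matrix Q.

    Hypothetical helper: plain Gaussian elimination, suitable only for tiny,
    well-conditioned illustrative models.
    """
    n = len(Q)
    # Transposed balance equations; the last (redundant) one is replaced
    # by the normalization constraint sum(pi) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / A[r][r]
    return x


# Assumed two-state model: state 0 = operational, state 1 = failed.
failure_rate = 0.001  # assumed rate of 0 -> 1 transitions, per hour
repair_rate = 0.1     # assumed rate of 1 -> 0 transitions, per hour
Q = [[-failure_rate, failure_rate],
     [repair_rate, -repair_rate]]

pi = steady_state(Q)
availability = pi[0]  # long-run fraction of time spent operational
print(round(availability, 6))
```

    For this two-state case the result matches the closed form repair_rate / (failure_rate + repair_rate), which gives a quick sanity check on the solver.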

    Delta Send-Recv for Dynamic Pipelining in MPI Programs

    Pipelining is necessary for efficient do-across parallelism, but its use is difficult to automate because it requires send-receive analysis and loop blocking in both sender and receiver code. The blocking factor is statically chosen. This paper presents a new interface called delta send-recv. Through compiler and run-time support, it enables dynamic pipelining. In program code, the interface is used to mark the related computation and communication. There is no need to restructure the computation code or compose multiple messages. At run time, the message size is dynamically determined, and multiple pipelines are chained among all tasks that participate in the delta communication. The new system is tested on kernel and reduced NAS benchmarks to show that it simplifies message-passing programming and improves program performance. Keywords: MPI; communication-computation overlapping; dynamic pipelining